Goto

Collaborating Authors

 action profile


Solving Graph-based Public Good Games with Tree Search and Imitation Learning

Neural Information Processing Systems

Public goods games represent insightful settings for studying incentives for individual agents to make contributions that, while costly for each of them, benefit the wider society. In this work, we adopt the perspective of a central planner with a global view of a network of self-interested agents and the goal of maximizing some desired property in the context of a best-shot public goods game. Existing algorithms for this known NP-complete problem find solutions that are sub-optimal and cannot optimize for criteria other than social welfare. In order to efficiently solve public goods games, our proposed method directly exploits the correspondence between equilibria and the Maximal Independent Set (mIS) structural property of graphs. In particular, we define a Markov Decision Process which incrementally generates an mIS, and adopt a planning method to search for equilibria, outperforming existing methods. Furthermore, we devise a graph imitation learning technique that uses demonstrations of the search to obtain a graph neural network parametrized policy which quickly generalizes to unseen game instances. Our evaluation results show that this policy is able to reach 99.5% of the performance of the planning method while being three orders of magnitude faster to evaluate on the largest graphs tested. The methods presented in this work can be applied to a large class of public goods games of potentially high societal impact and more broadly to other graph combinatorial optimization problems.



Queue Up Your Regrets: Achieving the Dynamic Capacity Region of Multiplayer Bandits

Neural Information Processing Systems

Consider N cooperative agents such that for T turns, each agent n takes an action an and receives a stochastic reward rn (a1,...,aN). Agents cannot observe the actions of other agents and do not know even their own reward function. The agents can communicate with their neighbors on a connected graph Gwith diameter d(G). We want each agent nto achieve an expected average reward of at least ฮปn over time, for a given quality of service (QoS) vector ฮป. AQoS vector ฮปis not necessarily achievable.



Your Title

Neural Information Processing Systems

Consider a team of cooperative players that take actions in a networkedenvironment. At each turn, each player chooses an action and receives a reward that is an unknown function of all the players' actions. The goal of the team of players is to learn to play together the action profile that maximizes the sum of their rewards. However, players cannot observe the actions or rewards of other players, and can only get this information by communicating with their neighbors.


056e8e9c8ca9929cb6cf198952bf1dbb-Supplemental-Conference.pdf

Neural Information Processing Systems

This search does not affect the computational complexity, which is O(ฮฝnDE +SE) for agent n that computes DE parallel consensus steps and goes over a listofSE actionprofiles. Intuitively,wewouldneedE KN tofindtheoptimalactionprofile even with no noise, which creates delays where agents have to wait for their average reward to go abovetheirฮปn. In the multitasking robots game, if agent n has Ren = 0, then theoptimalactionprofilea e hastosatisfya e,m = nforallm. Ifฮปisasafemarginawayfromthe boundary of C(G), then most agents will have Ren = 0 most of the time. Hence, their performance depends on the best action profile in SE.



Cooperative Multi-player Bandit Optimization

Neural Information Processing Systems

Consider a team of cooperative players that take actions in a networked-environment. At each turn, each player chooses an action and receives a reward that is an unknown function of all the players' actions. The goal of the team of players is to learn to play together the action profile that maximizes the sum of their rewards. However, players cannot observe the actions or rewards of other players, and can only get this information by communicating with their neighbors. We design a distributed learning algorithm that overcomes the informational bias players have towards maximizing the rewards of nearby players they got more information about. We assume twice continuously differentiable reward functions and constrained convex and compact action sets. Our communication graph is a random time-varying graph that follows an ergodic Markov chain. We prove that even if at every turn players take actions based only on the small random subset of the players' rewards that they know, our algorithm converges with probability 1 to the set of stationary points of (projected) gradient ascent on the sum of rewards function. Hence, if the sum of rewards is concave, then the algorithm converges with probability 1 to the optimal action profile.


Bayesian Optimization for Non-Cooperative Game-Based Radio Resource Management

arXiv.org Artificial Intelligence

Radio resource management in modern cellular networks often calls for the optimization of complex utility functions that are potentially conflicting between different base stations (BSs). Coordinating the resource allocation strategies efficiently across BSs to ensure stable network service poses significant challenges, especially when each utility is accessible only via costly, black-box evaluations. This paper considers formulating the resource allocation among spectrum sharing BSs as a non-cooperative game, with the goal of aligning their allocation incentives toward a stable outcome. To address this challenge, we propose PPR-UCB, a novel Bayesian optimization (BO) strategy that learns from sequential decision-evaluation pairs to approximate pure Nash equilibrium (PNE) solutions. PPR-UCB applies martingale techniques to Gaussian process (GP) surrogates and constructs high probability confidence bounds for utilities uncertainty quantification. Experiments on downlink transmission power allocation in a multi-cell multi-antenna system demonstrate the efficiency of PPR-UCB in identifying effective equilibrium solutions within a few data samples.


Aspiration-based Perturbed Learning Automata in Games with Noisy Utility Measurements. Part A: Stochastic Stability in Non-zero-Sum Games

arXiv.org Artificial Intelligence

Reinforcement-based learning has attracted considerable attention both in modeling human behavior as well as in engineering, for designing measurement- or payoff-based optimization schemes. Such learning schemes exhibit several advantages, especially in relation to filtering out noisy observations. However, they may exhibit several limitations when applied in a distributed setup. In multi-player weakly-acyclic games, and when each player applies an independent copy of the learning dynamics, convergence to (usually desirable) pure Nash equilibria cannot be guaranteed. Prior work has only focused on a small class of games, namely potential and coordination games. To address this main limitation, this paper introduces a novel payoff-based learning scheme for distributed optimization, namely aspiration-based perturbed learning automata (APLA). In this class of dynamics, and contrary to standard reinforcement-based learning schemes, each player's probability distribution for selecting actions is reinforced both by repeated selection and an aspiration factor that captures the player's satisfaction level. We provide a stochastic stability analysis of APLA in multi-player positive-utility games under the presence of noisy observations. This is the first part of the paper that characterizes stochastic stability in generic non-zero-sum games by establishing equivalence of the induced infinite-dimensional Markov chain with a finite dimensional one. In the second part, stochastic stability is further specialized to weakly acyclic games.